Segmentation and Analysis of Double-Sided Handwritten Archival Documents
Abstract
Historical handwritten documents are preserved in good condition in many national archives and libraries. One problem that many archivists face is the seeping of ink through the pages of certain double-sided handwritten documents after long periods of storage. This paper addresses this problem and develops a novel algorithm to extract clear textual images from interfering and overlapping areas. Based on the critical observation that the edges of strokes seeping through from the reverse side are not as sharp as those on the front side, we adopt an edge detection approach to suppress unwanted background patterns. First, an improved Canny edge detector with an edge orientation constraint is proposed. These improvements can link more weak foreground edges without introducing noise. Second, a new edge expansion model is presented for recovering broken edges of the words or characters on the front side. Finally, the outline of the whole document analysis system is illustrated. Segmentation results on real images are shown and evaluated.
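The abstract does not spell out how the edge orientation constraint works, so the following is only a minimal sketch under stated assumptions, not the authors' detector. It assumes the constraint means that, during Canny-style hysteresis, a weak edge pixel is linked to an existing edge pixel only when their gradient orientations differ by at most a tolerance. The names sobel, angle_diff, link_edges and the parameter max_turn are hypothetical, and smoothing and non-maximum suppression are omitted for brevity.

```python
import math
from collections import deque


def sobel(img):
    """Return (magnitude, orientation) grids for a 2-D list of gray levels.

    Border pixels are left at zero because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def link_edges(mag, ang, low, high, max_turn):
    """Hysteresis linking with an assumed orientation constraint.

    Pixels with magnitude >= high seed the edge map. A neighbour with
    magnitude >= low joins an edge only if its gradient orientation is
    within max_turn radians of the edge pixel it is linked from.
    """
    h, w = len(mag), len(mag[0])
    edge = [[mag[y][x] >= high for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if edge[y][x])
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy or dx) and 0 <= ny < h and 0 <= nx < w and not edge[ny][nx]:
                    close = angle_diff(ang[y][x], ang[ny][nx]) <= max_turn
                    if mag[ny][nx] >= low and close:
                        edge[ny][nx] = True
                        queue.append((ny, nx))
    return edge
```

In this sketch a small max_turn keeps weak pixels whose gradient direction departs from the stroke they would join out of the edge map, while setting max_turn to math.pi reduces the function to ordinary hysteresis thresholding.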
Similar resources
Character Extraction from Interfering Background - Analysis of Double-Sided Handwritten Archival Documents
The seeping of ink through the pages of certain double-sided handwritten documents after long periods of storage poses a serious problem for human readers and OCR systems. This paper addresses this problem through the recovery of content on the front side of a page from the interfering image caused by the handwriting on the reverse side. First, by adapting the Gaussian stochastic model, the inter...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Text Extraction from Historical Handwritten Documents by Edge Detection
Many national archives and libraries keep large numbers of historical handwritten documents. One problem that many archivists face is the seeping of ink through the pages of certain double-sided handwritten documents after long periods of storage. The result is that the handwritten characters from the reverse side appear as noise on the front side and even interfere with the front side char...
A Survey on Word Segmentation Method for Handwritten Documents
One of the most important and challenging tasks in a handwriting recognition pipeline is the segmentation of handwritten document images into text lines and words. Several problems inherent in handwritten documents such as the difference in the skew angle between text lines or along the same text line, the existence of adjacent text lines or words touching, the existence of characters with diff...
Robust Segmentation of Unconstrained Online Handwritten Documents
A segmentation algorithm that can detect different regions of a handwritten document, such as text lines, tables and sketches, will be extremely useful in a variety of applications such as retrieval, translation and genre classification. However, this task is extremely challenging for handwritten documents, which vary considerably in their structure and content. In this paper, we describe a rob...
Publication date: 2000